Trends in Spanish-American "Top 50 Tracks" Playlists
Spotify, which includes more than 16 million tracks and podcasts, is one of the most widely used music streaming platforms in the world. This makes it a relevant reference for understanding how listening trends change over time. In this notebook we analyze in depth the Spanish-American trending tracks as of November 14, 2024. The dataset contains tracks from the "Top-50" playlists of 17 Spanish-speaking countries: Colombia, Mexico, Spain, Argentina, Venezuela, Chile, Ecuador, Dominican Republic, Peru, Panama, Uruguay, Paraguay, Bolivia, Costa Rica, Guatemala, Honduras, and El Salvador.
The data was obtained from the Spotify API.
Import libraries
In [17]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots
import seaborn as sns
import datetime as dt
pio.renderers.default = 'notebook'
Loading Data
In [18]:
pd.set_option('display.max_columns',None)
tracks_df = pd.read_excel('top_tracks_latin_playlists.xlsx',sheet_name='tracks')
tracks_df.head()
Out[18]:
| | id | name | explicit | release_date | artist | popularity | playlist | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2btNsI4OvcVl7SAHQQDHFB | Mirame | True | 2024-04-17 | Blessd | 87 | Top-50-Colombia | 0.717 | 0.656 | 7 | -4.449 | 1 | 0.0797 | 0.141 | 0.000030 | 0.0661 | 0.695 | 175.956 | 157453 | 4 |
| 1 | 6WatFBLVB0x077xWeoVc2k | Si Antes Te Hubiera Conocido | False | 2024-06-21 | KAROL G | 95 | Top-50-Colombia | 0.924 | 0.668 | 11 | -6.795 | 1 | 0.0469 | 0.446 | 0.000594 | 0.0678 | 0.787 | 128.027 | 195824 | 4 |
| 2 | 13BDiikG6y5o5cQTK0HpW6 | Soltera - W Sound 01 | True | 2024-08-06 | W Sound | 79 | Top-50-Colombia | 0.734 | 0.578 | 1 | -4.147 | 1 | 0.2950 | 0.155 | 0.000242 | 0.1130 | 0.880 | 199.997 | 142022 | 4 |
| 3 | 7bywjHOc0wSjGGbj04XbVi | LUNA | False | 2023-12-01 | Feid | 89 | Top-50-Colombia | 0.774 | 0.860 | 7 | -2.888 | 0 | 0.1300 | 0.131 | 0.000000 | 0.1160 | 0.446 | 100.019 | 196800 | 4 |
| 4 | 5QjmUqgpPQgXgg4606DqZF | UWAIE | False | 2024-08-15 | Kapo | 88 | Top-50-Colombia | 0.705 | 0.783 | 9 | -4.783 | 0 | 0.0403 | 0.138 | 0.000000 | 0.0984 | 0.454 | 103.001 | 172427 | 4 |
Data Information
In [19]:
tracks_df.shape
#(Rows, columns)
Out[19]:
(849, 20)
In [20]:
tracks_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 849 entries, 0 to 848
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   id                849 non-null    object
 1   name              849 non-null    object
 2   explicit          849 non-null    bool
 3   release_date      849 non-null    object
 4   artist            849 non-null    object
 5   popularity        849 non-null    int64
 6   playlist          849 non-null    object
 7   danceability      849 non-null    float64
 8   energy            849 non-null    float64
 9   key               849 non-null    int64
 10  loudness          849 non-null    float64
 11  mode              849 non-null    int64
 12  speechiness       849 non-null    float64
 13  acousticness      849 non-null    float64
 14  instrumentalness  849 non-null    float64
 15  liveness          849 non-null    float64
 16  valence           849 non-null    float64
 17  tempo             849 non-null    float64
 18  duration_ms       849 non-null    int64
 19  time_signature    849 non-null    int64
dtypes: bool(1), float64(9), int64(5), object(5)
memory usage: 127.0+ KB
id: Track's unique identifier.
name: Track's name.
explicit: Whether or not the track has explicit lyrics. True: it has explicit lyrics. False: it does not have explicit lyrics.
release_date: Track's first release date.
artist: Artist who performed the track.
popularity: The popularity of the track. Popularity is calculated by an algorithm and is based, for the most part, on the total number of plays the track has had and how recent those plays are. 0 - 100, from least popular to most popular.
playlist: Playlist from which the track was extracted.
danceability: How suitable the track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. 0 - 1, from least danceable to most danceable.
energy: Perceptual measure of intensity and activity (fast, loud, noisy). 0 - 1, from least energy to most energy.
key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
loudness: The overall loudness of a track in decibels (dB). Values typically range between -60 and 0 dB.
mode: Track's modality (major or minor). Major is 1 and minor is 0.
speechiness: Presence of spoken words detected in the track. 0 - 1, from least speech-like to most speech-like.
acousticness: Confidence measure of whether the track is acoustic. The confidence ranges from 0 to 1.
instrumentalness: Predicts whether the track contains no vocals. 0 - 1. The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content.
liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
valence: Describes the musical positiveness conveyed by the track. 0 - 1, from least positive to most positive.
tempo: Track's estimated overall tempo in beats per minute (BPM).
duration_ms: Track's duration in milliseconds.
time_signature: An estimated time signature, given as the number of beats per bar. Spotify documents a range of 3 - 7, for "3/4" to "7/4", so a value of 4 means "4/4". In this dataset the values range from 1 to 5.
References:
https://developer.spotify.com/documentation/web-api/reference/get-playlist
https://developer.spotify.com/documentation/web-api/reference/get-audio-features
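The integer encodings described above (key, mode, duration_ms) are easier to read once decoded. The following is a minimal sketch of that decoding; `PITCH_CLASSES`, `describe_key`, and `duration_label` are hypothetical helpers introduced here for illustration and are not part of the original notebook.

```python
# Pitch-class names for the integer `key` column (0 = C ... 11 = B).
PITCH_CLASSES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
                 "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]


def describe_key(key: int, mode: int) -> str:
    """Hypothetical helper: turn the key/mode integers into a readable label."""
    if key == -1:  # -1 means no key was detected
        return "unknown"
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


def duration_label(duration_ms: int) -> str:
    """Hypothetical helper: format a duration in milliseconds as m:ss."""
    minutes, seconds = divmod(round(duration_ms / 1000), 60)
    return f"{minutes}:{seconds:02d}"


# Values taken from the first row shown in tracks_df.head()
print(describe_key(7, 1))
print(duration_label(157453))
```

Applied to the DataFrame, the same helpers could be used row by row, for example with `tracks_df.apply(lambda r: describe_key(r['key'], r['mode']), axis=1)`.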
Exploration
In [21]:
tracks_df.isnull().sum()
Out[21]:
id                  0
name                0
explicit            0
release_date        0
artist              0
popularity          0
playlist            0
danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
duration_ms         0
time_signature      0
dtype: int64
In [22]:
tracks_df.describe()
Out[22]:
| | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 | 849.000000 |
| mean | 79.511190 | 0.740389 | 0.691458 | 5.931684 | -5.303485 | 0.415783 | 0.101737 | 0.252781 | 0.003004 | 0.176362 | 0.640787 | 118.520093 | 186919.494700 | 3.859835 |
| std | 10.023213 | 0.112127 | 0.116101 | 3.179392 | 1.914944 | 0.493147 | 0.084900 | 0.192217 | 0.021944 | 0.153093 | 0.184413 | 30.101113 | 38599.784036 | 0.432086 |
| min | 0.000000 | 0.373000 | 0.356000 | 0.000000 | -14.178000 | 0.000000 | 0.026600 | 0.000856 | 0.000000 | 0.039000 | 0.109000 | 53.376000 | 68364.000000 | 1.000000 |
| 25% | 77.000000 | 0.686000 | 0.611000 | 4.000000 | -6.304000 | 0.000000 | 0.046900 | 0.103000 | 0.000000 | 0.095700 | 0.494000 | 97.541000 | 158242.000000 | 4.000000 |
| 50% | 82.000000 | 0.753000 | 0.702000 | 6.000000 | -4.945000 | 0.000000 | 0.065400 | 0.187000 | 0.000000 | 0.121000 | 0.658000 | 104.977000 | 182293.000000 | 4.000000 |
| 75% | 85.000000 | 0.820000 | 0.767000 | 9.000000 | -4.089000 | 1.000000 | 0.129000 | 0.372000 | 0.000030 | 0.197000 | 0.791000 | 131.842000 | 207747.000000 | 4.000000 |
| max | 100.000000 | 0.952000 | 0.965000 | 11.000000 | 0.020000 | 1.000000 | 0.625000 | 0.897000 | 0.287000 | 0.949000 | 0.974000 | 214.047000 | 390545.000000 | 5.000000 |
In [23]:
tracks_df[tracks_df.select_dtypes(include=['bool']).columns] = tracks_df.select_dtypes(include=['bool']).astype(int)
numeric_tracks_df = tracks_df.select_dtypes(include=['number'])
correlation = numeric_tracks_df.corr(method='kendall').round(2)
mask = np.zeros_like(correlation, dtype = bool)
mask[np.triu_indices_from(mask)] = True
correlation_viz = correlation.mask(mask).dropna(how='all')
correlation_matrix = px.imshow(correlation_viz, text_auto= True, height=600, color_continuous_scale=['#ffffff','#1db954'], aspect='equal', title='<b>Correlation Between Columns')
correlation_matrix.update_layout(title_x=0.5, title_font_size = 24, margin_pad = 5, font_size = 10, font=dict(color='#535353'), yaxis=dict(range=[0,100]))
By far the highest correlation is between "Loudness" and "Energy" (0.4). The next two highest are between "Time Signature" and "Loudness" (0.23), and between "Danceability" and "Explicit" (0.21). We will take a closer look at these three correlations later.
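The strongest pairs can also be read off the correlation matrix programmatically instead of by eye. The sketch below assumes a square correlation matrix such as the `correlation` DataFrame computed above; `top_correlated_pairs` is a hypothetical helper, and the toy matrix is made-up data used only to show the mechanics.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(corr: pd.DataFrame, n: int = 3) -> pd.Series:
    """Hypothetical helper: return the n pairs with the largest absolute correlation."""
    # Keep only the strictly lower triangle so each pair appears once
    # and the diagonal (always 1.0) is excluded.
    lower = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
    pairs = corr.where(lower).stack().dropna()
    # Rank by absolute value so strong negative correlations are not missed.
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(n)


# Toy correlation matrix (made-up values, not the notebook's results)
toy_corr = pd.DataFrame(
    [[1.0, 0.4, 0.1],
     [0.4, 1.0, -0.3],
     [0.1, -0.3, 1.0]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)
print(top_correlated_pairs(toy_corr, n=2))
```

In the notebook, the call would be `top_correlated_pairs(correlation)`. Ranking by absolute value is a design choice made here; the original text only discusses positive correlations.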
In [24]:
tracks_df['name_artist'] = tracks_df['name'] + ' - ' + tracks_df['artist']
top_popular_tracks_fig = px.treemap(tracks_df.drop_duplicates('name_artist').sort_values(by='popularity',ascending=False).head(20),path=[px.Constant('Tracks'),'name_artist'],values='popularity',color='popularity',color_continuous_scale=['#8cfab1','#1db954'])
top_popular_tracks_fig.update_traces(hovertemplate='<b>%{label}</b><br>Popularity: %{value}<extra></extra>',textinfo='label+value',textposition='middle center',textfont_color='#121212',marker=dict(cornerradius=5,line_color='#535353'))
top_popular_tracks_fig.update_layout(title='<b>Top 20 Popular Tracks</b>',title_x=0.5,font_color='#535353',title_font_size=24,hoverlabel_font_color='#535353',margin=dict(b=10))
In general, there is little variability in the popularity of the trending tracks. However, the four most popular tracks stand slightly apart from the rest, which suggests a noticeably higher number of recent plays.
In [25]:
# Rank artists by number of trending tracks, breaking ties by average popularity
artist_popularity = tracks_df.groupby(by='artist',as_index=False).agg(average_popularity=('popularity','mean'),artist_tracks=('name','nunique')).sort_values(by=['artist_tracks','average_popularity'],ascending=False).head(20)
popularity_fig = go.Figure(data=go.Bar(x=artist_popularity['artist'],y=artist_popularity['artist_tracks'], name= 'Artist Tracks', text=artist_popularity['artist_tracks'], marker=dict(color='#1db954'), hovertemplate='<b>Artist:</b> %{x}<br><b>Tracks:</b> %{y}<extra></extra>'))
popularity_fig.add_trace(go.Scatter(x=artist_popularity['artist'],y=artist_popularity['average_popularity'], name= 'Artists Average Popularity', text=artist_popularity['average_popularity'].round(1), textposition='top center', mode='lines+markers+text', line=dict(color='#535353'), hovertemplate='<b>Artist:</b> %{x}<br><b>Average Popularity:</b> %{y:.1f}<extra></extra>',yaxis='y2'))
popularity_fig.update_layout(title = "<b>Artist's Trending Tracks vs Average Popularity",title_x = 0.5, title_font_size = 24,font_color='#535353',xaxis=dict(title='<b>Top 20 Artists'),yaxis=dict(title=dict(text="<b># of Trending Tracks"),side="left"),yaxis2=dict(title=dict(text="<b>Average Popularity"),side="right",overlaying="y",tickmode="sync"),legend= dict(orientation='h',xanchor='center',x=0.5,yanchor='top',y=1.1))
Karol G has a high average popularity (88.5) with just four trending tracks, probably because she is widely recognized. Engel Montaz, despite having seven trending tracks, has a lower average popularity (50.3). This may be because his tracks sit in the lowest positions of the "Top-50" playlists.
In [26]:
explicit_general = tracks_df['explicit'].value_counts(normalize=True) * 100
explicit_general = explicit_general.set_axis(['Not Explicit','Explicit'])
trace_general_explicit = go.Pie(labels=explicit_general.index,values=explicit_general.values,marker=dict(colors=['#535353', '#1db954']),showlegend=False,textinfo='label+percent',hovertemplate='<b>%{label}</b><br>%{percent}<extra></extra>')
playlists_grouped = tracks_df.groupby('playlist')['explicit'].value_counts(normalize=True).unstack(fill_value=0).sort_values(by=0,ascending=False) * 100
playlists_grouped.columns = ['Not Explicit','Explicit']
trace_explicit = go.Bar(x=playlists_grouped['Explicit'],y=playlists_grouped.index,name='Explicit',text=playlists_grouped['Explicit'].apply(lambda x: f"{x:.0f}%"),insidetextanchor='middle',hovertemplate='<b>Explicit:</b> %{text}<extra></extra>',marker=dict(color='#1db954'),orientation='h', width=0.9)
trace_not_explicit = go.Bar(x=playlists_grouped['Not Explicit'],y=playlists_grouped.index,name='Not Explicit',text=playlists_grouped['Not Explicit'].apply(lambda x: f"{x:.0f}%"),insidetextanchor='middle',hovertemplate='<b>Not Explicit:</b> %{text}<extra></extra>',marker=dict(color='#535353'),orientation='h', width=0.9)
explicit_figs = make_subplots(rows=1,cols=2,specs=[[{'type': 'domain'}, {'type': 'xy'}]],subplot_titles=['<b>General Explicit vs Not Explicit Tracks','<b>Explicit vs Not Explicit Tracks by Playlist'],horizontal_spacing=0.2)
explicit_figs.add_trace(trace_general_explicit,row=1,col=1)
explicit_figs.add_trace(trace_explicit,row=1,col=2)
explicit_figs.add_trace(trace_not_explicit,row=1,col=2)
explicit_figs.update_layout(font_color='#535353',barmode='stack',xaxis=dict(range=[0,100]),showlegend=False,margin=dict(t=40,b=10))
In general, Spanish-American listeners tend to listen to tracks with non-explicit lyrics (59.1%). Chile is the exception: most of its trending tracks are explicit (58%), in contrast to Argentina (18%). Chile is also the only country where explicit tracks outnumber non-explicit ones.
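The per-playlist percentages can be cross-checked without building the stacked bar chart. Because 'explicit' is stored as 0/1 (or True/False), its mean within a group is the share of explicit tracks. This is a minimal sketch under that assumption; `explicit_share_by_playlist` and the toy data are hypothetical.

```python
import pandas as pd


def explicit_share_by_playlist(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: percentage of explicit tracks in each playlist."""
    # The mean of a 0/1 column is the fraction of rows equal to 1.
    share = df.groupby("playlist")["explicit"].mean() * 100
    return share.sort_values(ascending=False).round(1)


# Toy data (made up): playlist A has 3 of 4 explicit tracks, B has 1 of 2.
toy = pd.DataFrame({
    "playlist": ["A", "A", "A", "A", "B", "B"],
    "explicit": [1, 1, 1, 0, 0, 1],
})
print(explicit_share_by_playlist(toy))
```

This only works while 'explicit' is still numeric or boolean, that is, before any cell relabels it as text.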
In [27]:
analysis_features = ['explicit','danceability','energy','loudness','time_signature']
features_hists = make_subplots(rows=2,cols=3,subplot_titles=analysis_features)
for i, feature in enumerate(analysis_features):
    row = i // 3 + 1
    col = i % 3 + 1
    features_hists.add_trace(go.Histogram(x=tracks_df[feature],name=feature,marker=dict(color='#1db954'),hovertemplate='(%{x}, %{y})<extra></extra>'),row=row,col=col)
features_hists.update_layout(font_color='#535353',title_text='<b>Selected Features Distribution',title_x=0.5,showlegend=False)
features_hists
These are the features involved in the three highest correlations mentioned above. Here we can see that "danceability", "energy", and "loudness" have left-skewed distributions: most trending tracks score high on all three, and only a few are quiet, low-energy, or hard to dance to.
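The direction of the skew can be confirmed numerically: a negative sample skewness indicates a longer left tail. The sketch below uses the pandas `skew()` method on made-up values; `skew_report` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def skew_report(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: sample skewness and its direction for each column."""
    skew = df[columns].skew()

    def label(value: float) -> str:
        if value < 0:
            return "left-skewed"
        if value > 0:
            return "right-skewed"
        return "symmetric"

    return pd.DataFrame({"skewness": skew, "shape": skew.apply(label)})


# Toy data (made up): one column with a long left tail, one with a long right tail.
toy = pd.DataFrame({
    "danceability": [0.10, 0.80, 0.85, 0.90, 0.90],
    "speechiness": [0.03, 0.04, 0.05, 0.06, 0.60],
})
print(skew_report(toy, ["danceability", "speechiness"]))
```

In the notebook, the call would be `skew_report(tracks_df, ['danceability', 'energy', 'loudness'])`.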
In [ ]:
analysis_correlations = [('loudness','energy'),('time_signature','loudness'),
('explicit','danceability')]
x,y = analysis_correlations[0]
corr_scatter = px.scatter(tracks_df,x=x,y=y,color=x,trendline='ols',trendline_color_override='#1db954',color_continuous_scale=['#535353','#1db954'],title=f'<b>{x.capitalize()} vs {y.capitalize()}')
corr_scatter.update_layout(font_color='#535353',title_x=0.5)
Louder tracks (higher dB values) tend to have higher energy, that is, they are perceived as faster, louder, and noisier.
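The trend line in the scatter plot is an ordinary least-squares fit (plotly delegates `trendline='ols'` to statsmodels). Its slope and intercept can be approximated with NumPy alone, as in the sketch below; `fit_line` is a hypothetical helper and the sample values are made up.

```python
import numpy as np


def fit_line(x, y):
    """Hypothetical helper: least-squares slope and intercept of y against x."""
    slope, intercept = np.polyfit(
        np.asarray(x, dtype=float), np.asarray(y, dtype=float), deg=1
    )
    return float(slope), float(intercept)


# Toy data (made up): energy rises by 0.05 for every extra dB of loudness.
loudness = [-10, -8, -6, -4, -2]
energy = [0.4, 0.5, 0.6, 0.7, 0.8]
slope, intercept = fit_line(loudness, energy)
print(f"energy ≈ {slope:.3f} * loudness + {intercept:.3f}")
```

In the notebook, the call would be `fit_line(tracks_df['loudness'], tracks_df['energy'])`. A positive slope is consistent with the positive correlation reported earlier, although that correlation was computed with Kendall's method rather than a linear fit.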
In [29]:
x,y = analysis_correlations[1]
corr_scatter = px.box(tracks_df,x=x,y=y,color=x,color_discrete_sequence=['#1db954','#1db954'],title=f'<b>{x.capitalize()} vs {y.capitalize()}')
corr_scatter.update_layout(font_color='#535353',title_x=0.5,showlegend=False)
We can't see much here, because most of the tracks have an estimated 4/4 time signature (a value of 4). Even so, we can see that a large portion of these tracks have a high dB level (median = -4.783) and considerable variability (max = 0.02, min = -14.178).
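To check which time signature dominates, the column can be tabulated directly. The sketch below assumes, following Spotify's audio-features documentation, that the value is the number of beats per bar (so 4 means 4/4); `time_signature_shares` is a hypothetical helper and the sample values are made up.

```python
import pandas as pd


def time_signature_shares(series: pd.Series) -> pd.Series:
    """Hypothetical helper: percentage of tracks per time signature."""
    shares = series.value_counts(normalize=True).sort_index() * 100
    # Assumption: a value n stands for n beats per bar, written "n/4".
    shares.index = [f"{int(v)}/4" for v in shares.index]
    return shares.round(1)


# Toy data (made up): six of eight tracks are in 4/4.
toy = pd.Series([4, 4, 4, 3, 5, 4, 4, 4])
print(time_signature_shares(toy))
```

In the notebook, the call would be `time_signature_shares(tracks_df['time_signature'])`.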
In [30]:
x,y = analysis_correlations[2]
# Work on a copy so the relabeling does not overwrite 'explicit' in tracks_df
temp_tracks_df = tracks_df.copy()
temp_tracks_df['explicit'] = temp_tracks_df['explicit'].replace({1:'Explicit',0:'Not Explicit'})
corr_scatter = px.box(temp_tracks_df,x=x,y=y,color=x,color_discrete_sequence=['#1db954','#535353'],title=f'<b>{x.capitalize()} vs {y.capitalize()}')
corr_scatter.update_layout(font_color='#535353',title_x=0.5,showlegend=False)
Tracks with explicit lyrics tend to be more suitable for dancing (median = 0.786) than non-explicit ones (median = 0.73). Non-explicit tracks also reach further into low danceability (q1 = 0.635, min = 0.373).
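The medians and quartiles quoted from the box plot can also be computed as a table. The sketch below uses pandas quantiles with linear interpolation, so the results may differ slightly from the values shown in the plot's hover labels; `box_stats` is a hypothetical helper and the sample values are made up.

```python
import pandas as pd


def box_stats(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Hypothetical helper: five-number summary of value_col per group."""
    quantiles = (
        df.groupby(group_col)[value_col]
        .quantile([0.0, 0.25, 0.5, 0.75, 1.0])
        .unstack()  # one row per group, one column per quantile
    )
    quantiles.columns = ["min", "q1", "median", "q3", "max"]
    return quantiles


# Toy data (made up), not the notebook's results.
toy = pd.DataFrame({
    "explicit": ["Explicit"] * 5 + ["Not Explicit"] * 5,
    "danceability": [0.70, 0.75, 0.80, 0.85, 0.90,
                     0.40, 0.60, 0.70, 0.75, 0.80],
})
print(box_stats(toy, "explicit", "danceability"))
```

In the notebook, the call would be `box_stats(temp_tracks_df, 'explicit', 'danceability')`.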
Conclusions
We can conclude that:
- The three most popular trending tracks are: "Die with a Smile - Lady Gaga", "Birds of a Feather - Billie Eilish", and "Si Antes te Hubiera Conocido - Karol G".
- Karol G is the artist with the highest average popularity, with just four tracks, while Engel Montaz has the lowest average, with seven trending tracks.
- Most of the trending tracks in Spanish-American countries have non-explicit lyrics. Chile is the only country where most of the trending tracks have explicit lyrics.
- The higher the dB level, the higher the track's energy tends to be.
- Most of the trending tracks have an estimated 4/4 time signature.
- Tracks with explicit lyrics tend to be more suitable for dancing than non-explicit ones.